Skip to content

fix(run): emit session_error for session failures#81

Merged
byapparov merged 3 commits into
mainfrom
fix/session-error-json
Jun 22, 2026
Merged

fix(run): emit session_error for session failures#81
byapparov merged 3 commits into
mainfrom
fix/session-error-json

Conversation

@byapparov

Copy link
Copy Markdown
Contributor

Summary

  • emit structured session_error when the primary session publishes session.error
  • classify stored CLI NamedError objects without dropping data.message / data.statusCode
  • add coverage for stored auth/API error classification and v1 schema emission

Refs #63.

Validation

  • cd packages/cli && bun test test/cli/classify-session-error.test.ts test/cli/run-schema-v1.test.ts test/cli/run-session-error-exit.test.ts --timeout 30000
  • cd packages/cli && bun run typecheck

@byapparov byapparov self-assigned this Jun 13, 2026
@aictrl-dev

aictrl-dev Bot commented Jun 17, 2026

Copy link
Copy Markdown

I'll start by exploring the repository and understanding the PR changes.Let me get the PR diff and the full file contents.Let me get the surrounding context in run.ts and understand the emit function signature.

@byapparov byapparov added this to the Enterprise Observability milestone Jun 22, 2026
Comment thread packages/cli/src/cli/cmd/run.ts Outdated
Comment thread packages/cli/src/cli/cmd/run.ts Outdated
Comment thread packages/cli/test/cli/run-schema-v1.test.ts Outdated
@aictrl-dev

aictrl-dev Bot commented Jun 22, 2026

Copy link
Copy Markdown

Code review

Verdict: Looks good — only minor / nit comments below. · 🔴 0 · 🟠 0 · 🟡 2 · ⚪ 1 · 0/3 resolved

  • 🟡 packages/cli/src/cli/cmd/run.ts:575-579 — session_error wiring lacks a behavioral test
  • 🟡 packages/cli/src/cli/cmd/run.ts:575-581 — session_error + error both emitted; confirm no double-count
  • packages/cli/test/cli/run-schema-v1.test.ts:49 — Brittle hardcoded 1400-char window in structural test
🤖 Fix all 3 open findings with your agent
Fix the following code review findings on aictrl-dev/cli PR #81 (head branch).
Run the relevant tests/linters after each change.

1. packages/cli/src/cli/cmd/run.ts:575-579 — session_error wiring lacks a behavioral test
   Detail: The new `emit("session_error", …)` emission on the primary `session.error` path is verified only by `run-schema-v1.test.ts`, which asserts that the *source text* of run.ts contains certain substrings (`classifySessionError(props.error)`, `emit("session_error"`, `emit("error"`). That grep-style assertion passes even if `classifySessionError` is not imported, if the emit key is mistyped, or if the surrounding control flow is broken — it never drives a real `session.error` and observes the emitted payload. Consider a behavioral test (stub/observe `emit` and assert the actual `session_error` event is produced with the right reason/code/message) so this wiring can't silently regress.
2. packages/cli/src/cli/cmd/run.ts:575-581 — session_error + error both emitted; confirm no double-count
   Detail: For primary `session.error`s, both the new structured `session_error` event and the existing generic `error` event are now emitted (and `UI.error` may still print afterward). If any consumer/observer subscribes to both channels, the same failure can be counted or reported twice. If dual emission is intentional (structured channel vs. legacy channel), a short comment noting that would help future readers; otherwise consider gating the legacy `error`/`UI.error` path when `session_error` is handled.
3. packages/cli/test/cli/run-schema-v1.test.ts:49 — Brittle hardcoded 1400-char window in structural test
   Detail: `source.slice(idx, idx + 1400)` anchors the assertion window to a magic number. If the `session.error` block grows past 1400 chars (more conditions, longer payloads), a target string can slide outside the window and the `toContain` checks either miss the intended region or force the magic number to be bumped on every change. Prefer anchoring the window to a stable boundary (e.g. slice to the matching closing brace / next event handler) or assert against the full source so the test degrades predictably.
📋 Out-of-diff findings (3)
Sev Location Finding
🟡 packages/cli/src/cli/cmd/run.ts:575-579 session_error wiring lacks a behavioral test
🟡 packages/cli/src/cli/cmd/run.ts:575-581 session_error + error both emitted; confirm no double-count
packages/cli/test/cli/run-schema-v1.test.ts:49 Brittle hardcoded 1400-char window in structural test

Reviewed 4 files · 0 inline · view all 3 findings ↗


aictrl · AI code review for fast-moving teams · aictrl.dev

- Add behavioral test (run-session-error-emit.test.ts): spawns the CLI
  with --format json against a failing model and asserts the session_error
  JSON event is emitted with a classified reason/message. Closes the
  source-text-grep gap flagged by the bot.
- Fix brittle 1400-char window in run-schema-v1.test.ts: anchor the
  assertion window to the next event handler boundary instead of a magic
  character count, matching the pattern used in run-session-error-exit.test.ts.
- Add clarifying comment in run.ts session.error handler explaining that
  session_error (structured/classified) and error (raw/legacy) are
  intentionally distinct channels serving different consumers, not a
  double-count bug.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_0187MsfK1upr6K2BKVbmaebQ
@byapparov

Copy link
Copy Markdown
Contributor Author

Review response — PR #81

Triaged 3 findings (2x 🟡 minor, 1x ⚪ nit). All addressed in commit 58e48e420.

Issues addressed (pushed to this PR)

  • session_error wiring lacks a behavioral testpackages/cli/test/cli/run-session-error-emit.test.ts (new file): spawns the CLI with --format json against a nonexistent model, collects stdout JSON lines, and asserts a session_error event is present with a valid classified reason and non-empty message. This drives the real emission path end-to-end; source-text grep can't catch a mistyped event key or broken control flow. (commit 58e48e420)

  • Brittle hardcoded 1400-char windowpackages/cli/test/cli/run-schema-v1.test.ts:49: replaced source.slice(idx, idx + 1400) with the boundary-anchored pattern already used in run-session-error-exit.test.ts — slices to the next if (event.type === " handler so the window scales with the block instead of requiring a magic number bump on every addition. (commit 58e48e420)

Review claims verified false (no change needed)

  • "session_error + error both emitted; confirm no double-count" — verified false as a bug. The two events are intentionally distinct channels: session_error (line 575) fires only for the primary session (props.sessionID === sessionID) and carries classified reason/code/message for structured telemetry and CI consumers; error (line 581) fires for both primary and child sessions and carries the raw error object as the legacy observable. They serve different consumers with different payloads. Added a clarifying comment in run.ts to make this explicit for future readers (commit 58e48e420).

Comment thread packages/cli/test/cli/run-session-error-emit.test.ts Outdated
Comment thread packages/cli/src/cli/cmd/run.ts Outdated
Comment thread packages/cli/test/cli/run-session-error-emit.test.ts Outdated
@aictrl-dev

aictrl-dev Bot commented Jun 22, 2026

Copy link
Copy Markdown

Code review

Verdict: Address the major findings before merging. · 🔴 0 · 🟠 1 · 🟡 1 · ⚪ 1 · 0/3 resolved

  • 🟡 packages/cli/src/cli/cmd/run.ts:580 — session_error can be emitted more than once for one failure
  • packages/cli/test/cli/run-session-error-emit.test.ts:24 — Unexplained --conditions=browser on a CLI subprocess
  • 🟠 packages/cli/test/cli/run-session-error-emit.test.ts:31 — Behavioral test doesn't cover the new code path
🤖 Fix all 3 open findings with your agent
Fix the following code review findings on aictrl-dev/cli PR #81 (head branch).
Run the relevant tests/linters after each change.

1. packages/cli/src/cli/cmd/run.ts:580 — session_error can be emitted more than once for one failure
   Detail: This PR adds a third `session_error` emitter — the in-loop `session.error` handler for the primary session — alongside two pre-existing ones: `loopDone.catch(...)` and the `promptResult` rejection handler (both already call classifySessionError + emit("session_error")). In a mid-run failure it is plausible that a `session.error` event is observed (firing this new emit) AND `promptResult`/`loop()` later rejects (firing an existing emit), producing two `session_error` JSON lines for a single logical failure. Telemetry/CI consumers that count or branch on `session_error` could double-count or act twice.

Consider guarding all three sites with a shared `let sessionErrorEmitted = false` (emit + set on first occurrence), or document the expected at-most-N semantics for consumers.
   Suggested fix: Introduce a single `let sessionErrorEmitted = false` flag in the run scope and short-circuit all three `session_error` emit sites behind it, so at most one `session_error` is emitted per run regardless of which failure paths fire.
2. packages/cli/test/cli/run-session-error-emit.test.ts:24 — Unexplained --conditions=browser on a CLI subprocess
   Detail: The spawned CLI is run under `bun run --conditions=browser`. For a CLI that depends on Node/Bun runtime APIs (process, fs, child_process, real network calls), forcing the `browser` export condition risks resolving browser-shimmed module exports — potentially a mocked/no-op SDK client that does not exercise the real session.error path the test intends to cover, letting the assertion pass for the wrong reason. If the flag is intentional (to pin a specific module resolution), a one-line comment explaining why would help reviewers; if not, drop it so the subprocess runs under default (node) conditions.
   Suggested fix: Drop `--conditions=browser` unless it is intentionally required; if required, add a comment explaining why the browser condition is needed for this CLI subprocess.
3. packages/cli/test/cli/run-session-error-emit.test.ts:31 — Behavioral test doesn't cover the new code path
   Detail: The test is described as a behavioral regression test for #81, but its failure trigger (`--model nonexistent-provider/nonexistent-model`) almost certainly exercises a PRE-EXISTING emitter rather than the new code this PR adds.

The new code (run.ts, in the `session.error` event handler) emits `session_error` when a `session.error` EVENT is observed during the loop — i.e. a session that started successfully and failed mid-execution. A nonexistent model fails earlier: per the base code's own comment in run.ts's `promptResult` rejection handler, "If SessionPrompt.prompt() rejects BEFORE it enters its internal loop() (e.g., Session.get() fails, model not found)" — i.e. it rejects `promptResult`, and that rejection handler ALREADY emits `session_error` (classifySessionError + emit), predating this PR.

Consequence: this test passes on the base branch unchanged; removing or breaking the new run.ts `emit("session_error", …)` would leave it green. It does not actually guard the #81 behavior and gives false confidence.

Fix: trigger an in-execution session.error event (a session that starts then fails mid-run, e.g. a valid provider + revoked/empty key → auth failure during the run) rather than a pre-prompt model-resolution failure, and assert the event originates from the in-loop path. A focused unit test that feeds a `session.error` event into the loop would also cover it deterministically.
   Suggested fix: Drive a real in-execution failure (valid provider + empty/revoked key so the session starts and fails mid-run with a 401/ProviderAuthError), or add a unit test that injects a `session.error` event into the loop and asserts `session_error` is emitted — so the assertion fails if the new in-loop emit is removed.
📋 Out-of-diff findings (3)
Sev Location Finding
🟡 packages/cli/src/cli/cmd/run.ts:580 session_error can be emitted more than once for one failure
packages/cli/test/cli/run-session-error-emit.test.ts:24 Unexplained --conditions=browser on a CLI subprocess
🟠 packages/cli/test/cli/run-session-error-emit.test.ts:31 Behavioral test doesn't cover the new code path

Reviewed 5 files · 0 inline · view all 3 findings ↗


aictrl · AI code review for fast-moving teams · aictrl.dev

Finding 1 (TRUE/FIX): replace subprocess test that used nonexistent-model
(exercised pre-existing promptResult rejection handler, not the new in-loop
emit) with a mock-server --attach test that injects a session.error event
mid-stream. Test now fails if the new in-loop emit("session_error",…) is removed.

Finding 2 (TRUE/FIX): drop --conditions=browser from the test subprocess.
The flag was cargo-culted from the dev script; no packages in this repo export
browser-specific conditions, so it was harmless but unexplained and potentially
misleading. Removed without replacement.

Finding 3 (TRUE/FIX): add a shared `let sessionErrorEmitted = false` flag in
execute() scope. All three session_error emit sites (in-loop session.error
handler, loopDone.catch, promptResult rejection handler) now short-circuit on
the flag so at most one session_error is emitted per run regardless of which
failure paths fire. New test also asserts count === 1.
@byapparov

Copy link
Copy Markdown
Contributor Author

Round-2 code review verdict — PR #81

Analyzed at: 2026-06-22T09:41:03Z
Commit reviewed: 58e48e4
Fix commit: b09fec9
Round: 2

# Severity Location Finding Verdict Action
1 🟠 test/cli/run-session-error-emit.test.ts:31 Behavioral test doesn't cover the new code path TRUE FIX
2 test/cli/run-session-error-emit.test.ts:24 Unexplained --conditions=browser TRUE FIX
3 🟡 run.ts:~580 session_error can be emitted more than once TRUE FIX

Finding 1 — TRUE / FIX

Evidence: The round-1 subprocess test used --model nonexistent-provider/nonexistent-model, which fails at model resolution before the event loop starts. This triggers the pre-existing promptResult rejection handler (run.ts:774), not the new in-loop session.error handler (run.ts:580). Confirmed: removing the new in-loop emit("session_error",…) left the round-1 test green.

Fix: Replaced the subprocess test with a mock-server --attach test. The mock HTTP server serves an SSE stream containing a session.error event with ProviderAuthError followed by a session.status idle event. The CLI receives the session.error event mid-loop, the new in-loop handler fires, and session_error is emitted to stdout. In the --attach path, promptResult stays Promise.resolve() so the promptResult rejection handler cannot fire — the new handler is the only emit site reachable in this scenario.

Verification: Temporarily replaced the in-loop emit("session_error",…) with a no-op — test went red with expect(0).toBeGreaterThan(0). Restored emit — test is green.

Finding 2 — TRUE / FIX

Evidence: --conditions=browser was cargo-culted from package.json's dev script via the round-1 test. No package in this repo exports browser-specific module conditions, so it was harmless but unexplained. In the context of a CLI subprocess test it is actively confusing: reviewers (and the bot) correctly flag that forcing the browser condition could resolve shimmed/no-op exports.

Fix: Removed --conditions=browser from the test subprocess args. The new test runs under default (node) conditions. No behaviour change since no browser conditions are defined.

Finding 3 — TRUE / FIX

Evidence: Three independent session_error emit sites exist in execute():

  1. In-loop session.error event handler (line 580) — fires on session.error SSE event
  2. loopDone.catch() (line 727) — fires if loop() itself throws
  3. promptResult rejection handler (line 774) — fires if sdk.session.prompt() rejects before entering the loop

Sites 1 and 2 can co-fire: a session.error event fires site 1, then if the event stream subsequently throws (e.g. network drop), loop() rejects and fires site 2. Sites 1 and 3 are mutually exclusive in the --attach path (promptResult never rejects) but can co-fire in the local bootstrap path if the prompt call rejects AND a session.error event was already observed.

Fix: Added let sessionErrorEmitted = false in execute() scope. All three emit sites check-and-set the flag before emitting, so at most one session_error is emitted per run. The new test asserts sessionErrorEvents.length === 1.


{
  "schemaVersion": "1",
  "sourcePR": 81,
  "repo": "aictrl-dev/cli",
  "round": 2,
  "analyzedAt": "2026-06-22T09:41:03Z",
  "headSha": "58e48e420",
  "fixSha": "b09fec9d3",
  "findings": [
    { "id": "f1", "severity": "orange", "verdict": "TRUE", "action": "FIX", "location": "test/cli/run-session-error-emit.test.ts:31", "summary": "Behavioral test didn't cover in-loop session.error path; replaced with mock-server attach test" },
    { "id": "f2", "severity": "white", "verdict": "TRUE", "action": "FIX", "location": "test/cli/run-session-error-emit.test.ts:24", "summary": "Removed unexplained --conditions=browser from test subprocess" },
    { "id": "f3", "severity": "yellow", "verdict": "TRUE", "action": "FIX", "location": "run.ts:580", "summary": "Added sessionErrorEmitted guard flag; all 3 emit sites short-circuit so at most one session_error per run" }
  ]
}

@byapparov byapparov merged commit 5d84d0b into main Jun 22, 2026
5 checks passed
@byapparov byapparov deleted the fix/session-error-json branch June 22, 2026 11:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant